Effect of utilizing either a self-reported questionnaire or administrative data alone or in combination on the findings of a randomized controlled trial of the long-term effects of antenatal corticosteroids

Introduction A combination of self-reported questionnaire and administrative data could potentially enhance ascertainment of outcomes and alleviate the limitations of both in follow-up studies. However, it is uncertain how having access to only one of these data sources to assess outcomes affects study findings. Therefore, this study aimed to determine whether the study findings would be altered if the outcomes were assessed by different data sources alone or in combination. Methods At 50-year follow-up of participants in a randomized trial, we assessed the effect of antenatal betamethasone exposure on the diagnosis of diabetes, pre-diabetes, hyperlipidemia, hypertension, mental health disorders, and asthma using a self-reported questionnaire, administrative data, a combination of both, or any data source, with or without adjudication by an expert panel of five clinicians. Differences between relative risks derived from each data source were calculated using the Bland-Altman approach. Results There were 424 participants (46% of those eligible, aged 49 years, SD 1, 50% male). There were no differences in study outcomes between participants exposed to betamethasone and those exposed to placebo when the outcomes were assessed using different data sources. When compared to the study findings determined using adjudicated outcomes, the mean differences (limits of agreement) in relative risks derived from other data sources were: self-reported questionnaires 0.02 (-0.35 to 0.40), administrative data 0.06 (-0.32 to 0.44), both questionnaire and administrative data 0.01 (-0.41 to 0.43), and any data source 0.01 (-0.08 to 0.10). Conclusion Utilizing a self-reported questionnaire, administrative data, both questionnaire and administrative data, or any of these sources for assessing study outcomes had no impact on the study findings compared with when study outcomes were assessed using adjudicated outcomes.


Introduction
Data linkage to routinely collected data has numerous advantages for health researchers [1]. It can provide a high response rate, data on a range of outcomes in large studies, and comprehensive information on hard-to-reach sub-populations, all with less burden for participants and potentially lower costs for researchers compared to self-reported questionnaires [2][3][4]. Furthermore, data linkage enables the study of low-prevalence exposure-disease associations with comprehensive follow-up and continuous data collection [5], and it is an acceptable method among research ethics boards and patients [6]. One notable advantage of outcomes derived from routinely collected data is the implementation of a formal blinded outcome assessment [7]. This is because those responsible for collecting routine data are generally separate from the researchers making use of these data. Although blinded in-person assessments may enhance case ascertainment [8], this might not be as practical as data linkage in follow-up studies [9]. The potential reduction in costs with data linkage could also enhance the feasibility of extending clinical trial follow-up [10], which may enable researchers to identify long-term benefits or harms of interventions [3], especially in countries where individuals can be uniquely identified by a national patient identifier.
However, using administrative datasets also presents some challenges. The misclassification of outcomes or the absence of information could diminish the accuracy of treatment effect estimations [11]. Clinical trials that utilize routinely collected data for determining outcomes commonly show smaller treatment benefits compared to traditional trials that do not rely on such routinely collected data [10]. Applying for data can be a difficult and time-consuming process [12].
The self-reported questionnaire is an effective technique that can also provide reliable information about participants' clinical outcomes [13]. Several studies have found a high level of agreement between self-reported questionnaire data and administrative datasets for health outcomes [14,15]. However, the accuracy and reliability of self-reported data can be affected by many factors, including participants' capacity to recall diagnoses, their willingness to provide medical information, and the intricacy of the condition being reported [16][17][18]. Moreover, employing self-reported questionnaires can be resource-intensive and financially burdensome, particularly in the context of extensive research endeavours [19].
In principle, the combination of self-reported outcomes with administrative datasets could enhance outcome ascertainment and alleviate the limitations of both.
We have investigated the use of these combined data sources in undertaking a 50-year follow-up of a randomized trial of antenatal corticosteroids. Although corticosteroids were shown in that trial to reduce perinatal morbidity and mortality, animal studies have reported long-term adverse effects of antenatal corticosteroid exposure on the offspring, including higher blood pressure in rats and sheep [20][21][22], and increased basal insulin-to-glucose ratio in sheep [23]. Human observational studies have also reported that children exposed to antenatal corticosteroids had higher blood pressures and a higher incidence of mental disorders compared to those not exposed to corticosteroids [24,25]. There have been concerns about additional cardiovascular risk factors after antenatal corticosteroid exposure, but there is limited evidence from randomized trials [26]. We therefore undertook a follow-up study of survivors whose mothers participated in the Auckland Steroid Trial, the first and therefore oldest randomized trial of antenatal corticosteroids [27]. We found that there were no clinically important differences between corticosteroid and placebo exposed participants at 50 years of age [28].
As part of that follow-up study, we showed that record linkage to routinely collected administrative data could not replace self-reported questionnaire data but rather, the two data sources were additive. Use of both sources increased case ascertainment and may therefore increase the power for detection of differences between randomized groups [29]. The primary analysis utilized adjudicated outcomes (any data source after adjudication by the expert panel), incorporating records from all data sources. However, in some studies, researchers have access to only one of the data sources to assess study outcomes. Therefore, this study aimed to determine whether the study findings would be altered if the outcomes were assessed by different data sources alone or in combination.

Methods
The Auckland Steroid Trial, conducted in New Zealand from 1969 to 1974, was a randomized placebo-controlled trial that aimed to prevent neonatal respiratory distress syndrome by administering antenatal betamethasone [27]. The trial was not registered as it was conducted before clinical trial registries were initiated. We traced the adult offspring of mothers who had been enrolled in the trial and invited them to complete a self-reported questionnaire and to consent to data linkage [28]. We obtained written consent from the participants in the study. The study was reviewed and approved by the Northern A Health and Disability Ethics Committee of New Zealand (IRB00008714). The first self-reported questionnaire was obtained on 03/03/2021 and the last was received on 31/05/2022. Final linked data were provided by the various agencies between 04/04/2023 and 16/05/2023. Individuals provided informed written consent to data linkage, and a unique national patient identifier, the national health information (NHI) number, was submitted for each participant to each agency providing the data linkage. This identifier was returned with the data so that it could be integrated with the other individual patient level data. The outcomes of interest were diabetes mellitus, pre-diabetes, total diabetes, hyperlipidemia, high blood pressure, mental health disorders, and asthma (S1 Appendix). We compared the relative risk of each outcome between the betamethasone and placebo groups, utilizing various data sources either individually or in combination to assess study outcomes.

Data sources
The possible data sources for each outcome were the self-reported questionnaire data from the Auckland Steroid Trial follow-up study [28], linked administrative datasets (Table 1), both the questionnaire and administrative data (cases identified by both sources), either the questionnaire or administrative data (any data source), or adjudicated outcomes.
For all administrative datasets, the absence of any confirmatory evidence for a specific condition was assumed to represent no evidence for that condition. If discrepancies were observed between the questionnaire and administrative data, or if evidence was available from a single administrative dataset, a consensus on the diagnosis was reached by an expert panel consisting of the five clinician members of the study steering group who reviewed all records including self-report data (adjudicated outcome) [29]. For direct comparability, administrative data were right-censored at the date of completion of the self-reported questionnaire.
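The source definitions above can be sketched in code. The following is a minimal illustration only, not the study's actual data pipeline: the `Evidence` record, its field names, and the handling of unanswered questions are assumptions introduced here. It derives the questionnaire, administrative, "both", and "any" classifications for one participant and one outcome, applies the right-censoring rule, and flags discrepancies that would go to the expert panel. The second adjudication trigger (evidence from a single administrative dataset) is not modelled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Evidence:
    """Hypothetical per-participant record for a single outcome."""
    questionnaire_yes: Optional[bool]   # None = question not answered
    questionnaire_date: Optional[date]  # date the questionnaire was completed
    admin_first_date: Optional[date]    # earliest administrative evidence, if any


def admin_case(e: Evidence) -> bool:
    """Administrative case status, right-censored at questionnaire completion."""
    if e.admin_first_date is None:
        # Absence of confirmatory evidence is treated as no evidence.
        return False
    if e.questionnaire_date is not None and e.admin_first_date > e.questionnaire_date:
        # Evidence dated after the questionnaire is ignored for comparability.
        return False
    return True


def classify(e: Evidence) -> dict:
    """Case status under each data-source definition."""
    q = e.questionnaire_yes is True  # assumption: unanswered counts as not reported
    a = admin_case(e)
    return {
        "questionnaire": q,
        "administrative": a,
        "both": q and a,             # identified by both sources
        "any": q or a,               # identified by either source
        "needs_adjudication": q != a,  # discrepancy between the two sources
    }
```

For example, a participant who reports a diagnosis but whose first administrative evidence postdates the questionnaire would count as a case under "questionnaire" and "any", but not under "administrative" or "both", and would be flagged for adjudication.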

Statistical analyses
We calculated descriptive statistics (mean (SD), n (%)). We calculated adjusted relative risks (aRR) and 95% confidence intervals (CI) using generalized linear mixed models with a binomial distribution, a log link function, and robust standard error estimates, adjusting for sex and gestational age at trial entry [28]. We compared the relative risks from each data source to those assessed using adjudicated outcomes using the test of interaction [31]. Relative risks (95% CIs) are presented in forest plots. We also calculated the mean difference and the limits of agreement between relative risks derived from each data source and those from adjudicated outcomes using the Bland-Altman approach [32]. Data analysis was conducted using SAS (v9.4 SAS Institute Inc, Cary NC).
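As an illustration of the Bland-Altman summary described above, the sketch below computes the mean difference and 95% limits of agreement (mean difference plus or minus 1.96 standard deviations of the differences) between two sets of relative risks, one per outcome. It is a minimal sketch under stated assumptions: the function name is hypothetical, the sign convention (comparison source minus adjudicated) is assumed rather than taken from the paper, and the numbers in the usage lines are invented for illustration and are not study results.

```python
from statistics import mean, stdev


def bland_altman(reference, comparison):
    """Mean difference and 95% limits of agreement between paired estimates.

    reference  -- relative risks from the reference method (e.g. adjudicated)
    comparison -- relative risks from the method being compared
    Differences are taken as comparison minus reference (an assumed convention).
    """
    if len(reference) != len(comparison):
        raise ValueError("the two lists must have one estimate per outcome")
    diffs = [c - r for r, c in zip(reference, comparison)]
    d = mean(diffs)
    s = stdev(diffs)  # sample standard deviation of the differences
    return d, d - 1.96 * s, d + 1.96 * s


# Illustrative values only, not taken from the study.
adjudicated = [1.00, 1.00, 1.00]
single_source = [1.10, 0.90, 1.00]
mean_diff, lower, upper = bland_altman(adjudicated, single_source)
```

Narrow limits of agreement indicate that relative risks from the comparison source track the reference closely across outcomes; wide limits indicate that individual outcomes can diverge even when the mean difference is near zero.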

Results
Of the 424 participants in the follow-up study (46% of those eligible), 415 (98%) completed a questionnaire, 420 (99%) consented to at least one administrative dataset, and 379 (89%) consented to all administrative data sources [29]. The mean age was 49 years (SD 1) in both the betamethasone (n = 229) and placebo groups (n = 195) (Table 2). The proportion of males was higher in the betamethasone group (124/229, 54.1%) than in the placebo group (88/195, 45.1%), but the proportion of preterm births, participants currently living overseas, consent for data linkage, questionnaire completion, and availability of administrative data were similar in both groups (Table 2). The self-reported questionnaire response rates were: 97% for diabetes, 98% for pre-diabetes, 97% for total diabetes, 96% for hyperlipidemia, 96% for high blood pressure, 96% for mental health disorders, and 97% for asthma.

Table 1 (excerpt).
Haemoglobin A1c (HbA1c) and plasma glucose concentration for diabetes. Total cholesterol, LDL, and triglyceride concentrations for hyperlipidemia.
Self-reported questionnaire: Included questions about chronic conditions, medical events, and mental health based on the New Zealand Health Survey [30]. Participants were asked if they had ever been told by a doctor that they had specific diagnoses and what treatment they had received.
*National Health Information
https://doi.org/10.1371/journal.pone.0308414.t001

Using adjudicated outcomes, there was no difference in risk between participants exposed to betamethasone and those exposed to placebo in the diagnosis of diabetes, pre-diabetes, total diabetes, hyperlipidemia, high blood pressure, mental health disorders, or asthma (Table 3, Figs 1 and 2). There were also no differences in the outcomes between participants exposed to betamethasone and those exposed to placebo when outcomes were assessed using the self-reported questionnaire (mean difference in aRR 0.02, limits of agreement: -0.35 to 0.40, Table 3, Figs 1 and 2). The relative risks for the comparison between betamethasone and placebo groups for all outcomes assessed using adjudicated outcomes had similar magnitude to outcomes assessed using the self-reported questionnaire. The confidence intervals for estimates of treatment effects were slightly wider when using the self-reported questionnaire alone for all outcomes, except for diabetes where the confidence interval was narrower, compared to when using adjudicated outcomes (Table 3, Figs 1 and 2).
Similarly, the study outcomes did not differ between treatment groups when outcomes were assessed using administrative datasets (mean difference in aRR 0.06, limits of agreement: -0.32 to 0.44, Table 3, Figs 1 and 2). However, the risk of diabetes was non-significantly lower in the betamethasone group than in the placebo group when using adjudicated outcomes (aRR = 0.74, 95% CI [0.33, 1.64], P = 0.43), and non-significantly higher when using administrative datasets (aRR = 1.10, 95% CI [0.55, 2.18], P = 0.83), although confidence intervals showed substantial overlap (Fig 1, Table 3). Conversely, the risk of pre-diabetes was non-significantly higher in the betamethasone group than in the placebo group when using adjudicated outcomes (aRR = 1.13, 95% CI [0.56, 2.26], P = 0.67) but non-significantly lower when using administrative datasets (aRR = 0.87, 95% CI [0.50, 1.53], P = 0.68), although again there was substantial overlap in the confidence intervals (Fig 1, Table 3). The confidence intervals were slightly wider when using administrative datasets alone for all outcomes, except for pre-diabetes where the confidence interval was narrower, compared to when using adjudicated outcomes (Table 3, Figs 1 and 2).
There was no difference in the outcomes between treatment groups when outcomes were assessed using both the questionnaire and administrative data (mean difference in aRR 0.01, limits of agreement: -0.41 to 0.43, Table 3, Figs 1 and 2). However, the risk of pre-diabetes was non-significantly higher in the betamethasone group than in the placebo group when using adjudicated outcomes (aRR = 1.13, 95% CI [0.56, 2.26], P = 0.67), but non-significantly lower and with a wider CI when using both the questionnaire and administrative data (aRR = 0.79, 95% CI [0.14, 4.43], P = 0.64), although confidence intervals showed substantial overlap (Fig 1, Table 3).
The study outcomes did not differ between treatment groups when outcomes were assessed using any data source (mean difference in aRR 0.01, limits of agreement: -0.08 to 0.10, Table 3, Figs 1 and 2). The relative risks for comparison between betamethasone and placebo groups for all outcomes assessed using adjudicated outcomes had similar magnitude to outcomes assessed using any data source.
Comparison of aRRs assessed using questionnaire data, administrative datasets, both questionnaire and administrative data, or any data source with those calculated using adjudicated outcomes showed that aRRs were not significantly different for any outcome (Table 4). aRRs assessed using questionnaire data were also not different from those assessed using any administrative dataset.
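To make the comparison of two relative risks concrete, the sketch below implements one common form of a test of interaction: the difference of the log relative risks divided by the square root of the sum of their squared standard errors, referred to a standard normal distribution. This is an illustration under stated assumptions, not a reproduction of Table 4: the function names are hypothetical, the standard errors are back-calculated from the published 95% confidence intervals, and the test treats the two estimates as independent, which is only approximate here because both are derived from the same participants.

```python
import math


def se_log_rr(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error of log(RR), back-calculated from a 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def interaction_test(rr1, ci1, rr2, ci2):
    """Compare two relative risks on the log scale.

    Returns the ratio of relative risks, the z statistic, and a two-sided
    P-value, assuming the two estimates are independent.
    """
    diff = math.log(rr1) - math.log(rr2)
    se = math.sqrt(se_log_rr(*ci1) ** 2 + se_log_rr(*ci2) ** 2)
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P-value
    return math.exp(diff), z, p


# Diabetes: adjudicated outcomes versus administrative datasets,
# using the aRRs and 95% CIs reported in the Results.
ratio, z, p = interaction_test(0.74, (0.33, 1.64), 1.10, (0.55, 2.18))
print(f"ratio of RRs = {ratio:.2f}, z = {z:.2f}, P = {p:.2f}")
```

Under these assumptions the P-value for the diabetes comparison is well above 0.05, which is consistent with the substantial overlap of the confidence intervals noted in the text.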

Discussion
We aimed to determine the effect of using different data sources, alone or in combination, on the findings of a study of the long-term effects of antenatal corticosteroids in a randomized trial. We found no differences in the relative risk of any of the study outcomes between those exposed to betamethasone and those exposed to placebo when outcomes were assessed using different data sources.
The findings of this 50-year follow-up study align with those of a previous 30-year follow-up study of the same cohort that found no differences between treatment groups in rates of hypertension, systolic or diastolic blood pressure, diabetes mellitus, or fasting lipid concentrations [33]. They are also consistent with a 20-year follow-up of another randomized trial of antenatal corticosteroids that found no differences between treatment groups in the rate of mental health disorders [34].
The advantages of a self-reported questionnaire are that it is useful for participants living overseas, provides a definitive outcome in one step, requires neither record linkage nor extraction of outcome data after that linkage, and is free from the associated costs and delays of record linkage and data extraction. Additionally, self-reported questionnaires do not involve participants in additional laboratory tests or physical clinic visits. However, self-reporting of non-severe conditions may be less reliable as participants might not be aware of their chronic conditions.
Administrative datasets used in our study included different types of information such as prescribed medications, laboratory test results, clinic attendance, and hospital admissions, making them reliable sources for outcome ascertainment. However, administrative data were not available for those living overseas, and laboratory results were only available for participants during the time they were residing in the northern geographic region. Given the advantages and disadvantages of each of these data sources, we found that either of them alone could be a useful source for outcome ascertainment, especially when data availability is not related to treatment assignment.
Using the self-reported questionnaire data alone yielded relative risks similar in magnitude and direction to those assessed using adjudicated outcomes for all study outcomes. Other studies have shown that patient-reported outcomes can serve as a useful tool for collecting outcome data [35,36]. However, high rates of missing self-reported data can lead to an underestimation of outcomes and reduce the study's power to detect differences between treatment groups [37]. In our study the completion rate for self-reported questionnaires was high, and this was the sole method of obtaining data for participants living overseas. The questionnaire also identified more cases of high blood pressure and mental health disorders than administrative datasets where the only administrative data available were pharmaceutical dispensing records, although adding administrative data to the questionnaire data increased the number of cases for all outcomes [29]. For a severe outcome such as diabetes, participants may be more likely to be aware of their condition, leading to a lower likelihood of misclassification in self-reported data compared to administrative data and narrower confidence intervals for this outcome. While we have previously shown that the self-reported questionnaire underestimated all outcomes [29], this study confirmed that using the self-reported questionnaire alone to assess study outcomes would not alter the findings of the study on the long-term effects of antenatal corticosteroids on chronic conditions.
Similarly, using administrative data alone to assess differences between treatment groups revealed no significant differences in study outcomes, although the direction of the relative risks for diabetes and pre-diabetes was reversed. This reversal appeared to occur because a small number of participants in one randomized group met the criteria for having the condition in administrative datasets, but were judged by the expert panel as being treated for other reasons (false positive) [29]. Despite this, the confidence intervals showed substantial overlap when using administrative datasets alone and when using adjudicated outcomes to assess study outcomes. However, a higher rate of misclassifying outcomes may introduce bias in other studies. We have also previously shown that administrative data underestimated outcome incidence, but using administrative datasets alone to assess outcomes had no impact on the result of the study when compared with adjudicated outcomes [29].
We found that the assessment of outcomes using a combination of questionnaire and administrative data gave similar estimates of treatment effects to those assessed using adjudicated outcomes. A Cochrane systematic review of 47 randomized trials also reported that treatment effect estimates for outcome events assessed by adjudication committee did not differ from those assessed by onsite assessors [38]. Similarly, a study investigating the effects of randomized blood pressure lowering treatment on recurrent stroke using investigator diagnosis and adjudication by committee reported that the adjudication process had no apparent impact on the study's conclusion and argued that excluding adjudication could reduce the cost of conducting clinical studies [39]. Another study showed that routinely collected data could be used alone to assess serious vascular events in a follow-up study of myocardial infarction without the need for clinical adjudication [40]. Since we have previously reported that only a small number of participants with diabetes, pre-diabetes, high blood pressure, and asthma identified using the combined data sources were not confirmed by the expert panel, our findings align with other studies suggesting that adjudication may be unnecessary in determining the outcomes of a randomized trial [29].
Using a combination of questionnaire and administrative data can have several benefits for outcome ascertainment, potentially improving the study's power and helping to minimize underestimation of the outcomes. When there is a difference in the incidence of study outcomes between the treatment groups, underestimation of the outcomes, for example by using a single data source for ascertainment, could introduce bias either toward or away from the null [41]. In a meta-research study that compared randomized trials using routinely collected data for outcome assessment with traditional clinical trials, of seven comparisons in which the traditional clinical trials reported statistically significant treatment benefits, the trials using routinely collected data showed no significant treatment benefit in three, smaller treatment benefits in three (bias toward the null), and a harmful effect of the treatment in one (bias away from the null) [10]. However, when there is no difference in the incidence of conditions between the treatment and placebo groups, as in our study, underestimation of the outcome using one data source may not significantly affect the study's results but could reduce the precision of the effect estimate. This reduction in precision was evident in our study, with slightly wider confidence intervals for estimates of treatment effects when using either the self-reported questionnaire or administrative datasets alone, as compared to using adjudicated outcomes.
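The loss of precision described above can be illustrated with a simple unadjusted calculation. The sketch below is not the adjusted model used in the study: it uses the standard large-sample formula for the standard error of a log relative risk, the function name is hypothetical, and the counts are invented for illustration. It assumes that under-ascertainment is the same in both groups and that no false positives occur, so the relative risk itself is unchanged and only the confidence interval widens.

```python
import math


def rr_with_ci(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Unadjusted relative risk with a large-sample 95% confidence interval."""
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    se = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)


# Hypothetical counts: complete ascertainment finds 40 cases per 200 in each
# group; a single source that captures half of the cases finds 20 per 200.
complete = rr_with_ci(40, 200, 40, 200)
single_source = rr_with_ci(20, 200, 20, 200)

for label, (rr, lower, upper) in [("complete", complete), ("single source", single_source)]:
    print(f"{label}: RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

In this example the point estimate stays at 1.00 under both scenarios, while the confidence interval is wider when fewer cases are ascertained. If ascertainment differed between groups, or if false positives occurred, the point estimate could also shift.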

Strengths of the study
By utilizing both self-reported questionnaires and administrative data for outcome ascertainment, alongside high completion and consent rates for these data sources, our study was able to investigate the impact of utilizing these sources alone or in combination on trial findings. A high participation rate among those eligible for the study and unique identifiers for record linkage were other strengths of our study.

Limitations of the study
Our study lacked in-person clinical assessments, which could serve as the gold standard for comparisons. Our study may also involve some selection bias, given the relatively low follow-up rate. However, the baseline demographic variables of those who were eligible were similar to those of participants who consented, suggesting that differential bias between randomised groups is unlikely.
Future studies that encompass a broad range of outcomes, including major events across a wider age range, and that incorporate in-person assessments as a gold standard could provide greater insight into the potential underestimation of outcomes by various methods and into how relying solely on self-reported questionnaires or administrative data affects the findings of follow-up studies.

Conclusion
We aimed to investigate whether using different data sources to assess outcomes could impact the findings of a follow-up study of a randomized trial. We found that using a self-reported questionnaire alone, administrative datasets alone, a combination of both, or all data sources with or without adjudication, for assessing diabetes, pre-diabetes, total diabetes, hyperlipidemia, high blood pressure, mental health disorders and asthma in a follow-up study of a randomized trial had no impact on the study conclusions.

Fig 1. Forest plot of adjusted relative risks for comparison of the incidence of a. diabetes, b. pre-diabetes and c. total diabetes between betamethasone and placebo groups assessed using different data sources.
https://doi.org/10.1371/journal.pone.0308414.g001

Fig 2. Forest plot of adjusted relative risks for comparison of the incidence of a. hyperlipidemia, b. high blood pressure, c. mental health disorders, and d. asthma between betamethasone and placebo groups assessed using different data sources.
https://doi.org/10.1371/journal.pone.0308414.g002

Table 4. P-values for comparison of relative risks for study outcomes assessed using different data sources alone or in combination.
P-values calculated using the test of interaction [31].
https://doi.org/10.1371/journal.pone.0308414.t004